Emotions in time domain synthesis
Abstract
A preliminary test exploring four emotions showed that conveying emotions by time domain synthesis may be possible. A more sophisticated test was therefore carried out to determine the influence of the prosodic parameters on the perception of a speaker's emotional state. Six different emotional states were investigated. The stimuli of the second test were used in three different testing procedures: as natural speech, as resynthesized speech, and reduced to a sawtooth signal. The recognition rates were lower than in the preliminary test, although the differences between the recognition rates of natural and synthetic speech were comparable in both tests. The outcome of the sawtooth test showed that the amount of information about a speaker's emotional state carried by F0, energy and overall duration is rather small. However, we were able to determine relations between the acoustic prosodic parameters and the emotional content of speech.

1. MOTIVATION

This study explores the possibility of simulating emotions in time domain speech synthesis. In earlier studies dealing with the acoustic-phonetic correlates of emotions (see e.g. Klasmeyer, 1995), voice quality phenomena such as jitter or different modes of excitation have been found to be important factors. These phenomena cannot easily be controlled in time domain speech synthesis. However, it would be useful to be able to simulate emotions in order to make the synthesis sound more lively. The factors that can easily be manipulated in time domain speech synthesis are the prosodic parameters duration, fundamental frequency and energy. So the question about emotions in time domain synthesis can be reformulated as follows: How much information about the speaker's emotional state is conveyed by these three prosodic parameters?

1. PRELIMINARY EXPERIMENT

1.1 Natural Speech

In a preliminary experiment, three emotionally neutral German sentences were chosen. The sentences (given here in English translation) were "There will be snow this weekend", "No" and "Tomorrow everything will be different". They were uttered by three speakers in a neutral style and while simulating three different emotions: joy, fear and anger. The recordings were made with a movable microphone held by the speaker, so that the speakers were free to gesticulate. The 36 stimuli were played to 8 subjects. They recognized the intended emotions in 82% of cases (chance level: 25%; chi-square test: p < 0.05 for all subjects). Angry and neutral speech were recognized most reliably (see Figure 1). The speaker and the sentence with the lowest identification rates were excluded from the following experiment.

1.2 Resynthesis

The 16 remaining utterances were resynthesized by a time domain synthesis system (Portele et al., 1994) with the same prosodic features as the original utterances, using two different unit inventories, one male and one female. Durations and energy values were measured by hand; the pitch was determined automatically. (One difficulty was the numerous overmodulations caused by the recording conditions: the pitch marks could not always be set correctly, so the pitch contours could not always be transferred with the desired quality.) The stimuli were played to 9 subjects. As expected, the classification was worse than for the natural speech: 55% correct (chance level: 25%; chi-square test: p < 0.05 for 8 subjects). The emotions most often classified correctly were fear and neutral speech (see Figure 1).

1.3 Preliminary Conclusions

Our hypothesis regarding the low recognition rates was that the poor recording conditions had impaired the pitch transfer and thus hindered recognition. This was supported by the fact that most subjects had difficulty recognizing joy, an emotion that was marked by an enhanced pitch range. Still, the results suggested that it should be possible to convey emotions in time domain synthesis without difficulty.
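
The comparisons against the 25% chance level reported above can be illustrated with a small sketch. The source does not say how the chi-square test was set up, so this assumes one plausible reading: a goodness-of-fit test, per subject, of correct versus incorrect responses against the counts expected under guessing among four response categories. The function name and the per-subject counts (30 of 36, 9 of 16) are hypothetical and chosen only to lie near the reported 82% and 55%; they are not data from the study.

```python
import math


def chi_square_vs_chance(correct, total, chance=0.25):
    """Chi-square goodness-of-fit (1 degree of freedom) of one subject's
    correct/incorrect counts against the counts expected under guessing."""
    wrong = total - correct
    expected_correct = total * chance
    expected_wrong = total * (1.0 - chance)
    chi2 = ((correct - expected_correct) ** 2 / expected_correct
            + (wrong - expected_wrong) ** 2 / expected_wrong)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts for one subject in each condition (not from the paper).
natural_chi2, natural_p = chi_square_vs_chance(correct=30, total=36)
resynth_chi2, resynth_p = chi_square_vs_chance(correct=9, total=16)

print(f"natural speech: chi2 = {natural_chi2:.2f}, p = {natural_p:.2g}")
print(f"resynthesized:  chi2 = {resynth_chi2:.2f}, p = {resynth_p:.2g}")
```

With only 16 stimuli the expected number of correct answers under chance is 4, which is below the usual rule of thumb of 5 for the chi-square approximation; an exact binomial test would be a more cautious choice in that case.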
Publication date: 1996